Engineering posts about Service Discovery

Curated summaries and key learnings for engineers working with Service Discovery.

Rethinking Distributed Systems for Serverless Performance and Reliability

The article explores the evolution of serverless compute for Apache Spark, addressing long-standing architectural challenges that have hindered performance and reliability. It emphasizes the need for...

Slack

15m

From SSH to REST: A Security-Driven Modernization of Slack’s EMR Data Pipelines

The article outlines Slack's transition from a legacy SSH-based architecture to a modern REST-based job submission system for its data pipelines. Initially, the reliance on SSH created significant...

Cloudflare

12m

Rearchitecting the Workflows control plane for the agentic era

The article discusses the rearchitecting of the Workflows control plane to accommodate a shift towards agent-triggered workflows, necessitated by the increasing demand for durable execution engines...

Salesforce

Building a Distributed Persistent Queue That Scaled AI Workloads 5x Under LLM Rate Limits

The article discusses the engineering of a distributed persistent queue that orchestrates AI workloads and human workflows within strict infrastructure limits. It highlights the challenges of scaling...

Databricks

Zero-Downtime Patching in Lakebase Part 1: Prewarming

The article discusses the challenges associated with planned maintenance in database systems, particularly focusing on the performance degradation caused by cold restarts. It introduces Lakebase's...

Databricks

Multi-Cloud Challenges, Intelligent Load Balancing, and AI-Powered Workflows: Databricks at SRECon 2026

The article highlights Databricks' advancements in infrastructure reliability and efficiency as presented at SRECon 2026. It delves into the challenges of multi-cloud operations, particularly...

Atlassian

13m

Scaling Jira cloud Migrations, One Bottleneck at a Time

How Data 360 Optimized Kubernetes Scheduling Architecture, Delivering 13% Cost Savings

How we rebuilt the search architecture for high availability in GitHub Enterprise Server

Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters

Welcoming Stately Cloud to Databricks: Investing in the Foundation for Scalable AI Applications